An Algorithm for Reducing Text Line Candidates of Incorrect Orientation
Authors
Abstract
Japanese documents often contain both horizontally and vertically printed text lines on the same page. Document analysis systems therefore need to detect the correct orientation of text lines and to select the text line candidates of correct orientation. We designed an efficient framework for this procedure and developed several algorithms that reduce text line candidates of incorrect orientation. However, our previous system could handle only text lines with slight skew (within ±10°). To improve the performance of our document analysis system, we recently developed an improved text line extraction algorithm that can handle curved or wavy text lines as well as straight text lines of arbitrary orientation. We have also developed a new candidate reduction algorithm for arbitrarily oriented text lines. This paper mainly describes the candidate reduction algorithm in detail and also gives an overview of the system. The candidate reduction algorithm is based on the a priori knowledge that inter-line spacing is much wider than inter-character spacing in most documents. Experimental results show that the algorithm works very well for many documents written in Japanese and English, including very complicated ones and documents that contain curved or wavy text lines.
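The spacing heuristic the abstract relies on can be illustrated with a small sketch. It assumes that each text line candidate is given as a list of axis-aligned character bounding boxes with a known orientation ("horizontal" or "vertical"), and that two candidates compete when they share a character. The names `Box`, `mean_gap`, and `reduce_candidates` are hypothetical. This is not the authors' algorithm: it handles only straight, axis-aligned lines, whereas the paper targets arbitrarily oriented and curved lines.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Hypothetical character bounding box (left, top, width, height)."""
    x: float
    y: float
    w: float
    h: float


def mean_gap(boxes, orientation):
    """Mean gap between consecutive characters along the line direction.

    Assumption: a horizontal line is read along x, a vertical one along y.
    A single-character candidate has no gap and gets +inf.
    """
    if orientation == "horizontal":
        ordered = sorted(boxes, key=lambda b: b.x)
        gaps = [b.x - (a.x + a.w) for a, b in zip(ordered, ordered[1:])]
    else:
        ordered = sorted(boxes, key=lambda b: b.y)
        gaps = [b.y - (a.y + a.h) for a, b in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else float("inf")


def reduce_candidates(candidates):
    """Drop candidates that lose a spacing comparison.

    candidates: dict name -> (orientation, list[Box]).
    For every pair that shares a character box, the candidate with the larger
    mean inter-character gap is discarded, on the assumption that characters
    within a true line are closer together than characters on adjacent lines.
    """
    names = list(candidates)
    rejected = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            orient_a, boxes_a = candidates[a]
            orient_b, boxes_b = candidates[b]
            if set(boxes_a) & set(boxes_b):
                gap_a = mean_gap(boxes_a, orient_a)
                gap_b = mean_gap(boxes_b, orient_b)
                if gap_a < gap_b:
                    rejected.add(b)
                elif gap_b < gap_a:
                    rejected.add(a)
    return {n: c for n, c in candidates.items() if n not in rejected}


if __name__ == "__main__":
    # A 3x3 block of horizontal text: 2-unit character gaps, 12-unit line gaps.
    grid = {(r, c): Box(c * 12, r * 22, 10, 10) for r in range(3) for c in range(3)}
    cands = {f"row{r}": ("horizontal", [grid[(r, c)] for c in range(3)]) for r in range(3)}
    cands.update({f"col{c}": ("vertical", [grid[(r, c)] for r in range(3)]) for c in range(3)})
    kept = reduce_candidates(cands)
    print(sorted(kept))  # ['row0', 'row1', 'row2']
```

In the example, every row candidate shares a character with every column candidate, and because the rows have the smaller mean gap (2 versus 12), all vertical candidates are removed. A pairwise comparison is used rather than a page-wide vote because, as the abstract notes, a single page may mix both orientations.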
Similar Resources
Natural scene text localization using edge color signature
Localizing text regions in images taken from natural scenes is a challenging problem due to variations in the font, size, color and orientation of text. In this paper, we introduce a new concept, the so-called Edge Color Signature, for localizing text regions in an image. This method is able to localize both Farsi and English texts. In the proposed method, first a pyramid using diff...
Orientation and Scale Invariant Text Region Extraction in WWW Images
Text extraction from a web image is important for web indexing because the text can contain key information about the web page. This paper presents a method to detect text with various orientations and multiple font sizes in a web image. The proposed method consists of three steps: 1) color reduction of the low-resolution web image by a spatial merging algorithm, 2) extraction of character candidates by fin...
An Efficient Coupled Genetic Algorithm-NLP Method for Heat Exchanger Network Synthesis
Synthesis of heat exchanger networks (HENs) is inherently a mixed-integer nonlinear programming (MINLP) problem. Solving such problems leads to difficulties...
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, the production of text documents has grown exponentially, which makes their proper classification necessary for better access. One of the main problems of classifying text documents is working in a high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...